Poster: molly Date: Mar 27, 2005 1:15pm
Forum: web Subject: Re: Robots archive - noarchive META tags

We respect robots.txt files, and more information about them is available on the web. robotstxt.org is a good place to start, and there are a few robots.txt generators out there.

Create a text file called robots.txt in Notepad or whatever text editor you use and put this in it:

User-agent: ia_archiver
Disallow:

User-agent: *
Disallow: /

This means ia_archiver can get to everything, but you are disallowing everybody else, starting at the top level of your directories. You can specifically name the other robots you want to disallow if you prefer. Put this file in the top level of your website, and you are good to go!
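
As a rough way to check the rules above before uploading the file, here is a minimal sketch using Python's standard urllib.robotparser module. The helper name can_fetch, the example.com URL, and the agent name "SomeOtherBot" are made up for illustration. This only shows how one standard parser reads these rules; it does not claim to show how any particular crawler interprets them.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the post: ia_archiver may fetch everything,
# every other robot is disallowed from the top level down.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow:

User-agent: *
Disallow: /
"""


def can_fetch(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if `agent` may fetch `url` under the given rules.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://example.com/page.html"  # placeholder URL
    print("ia_archiver:", can_fetch("ia_archiver", page))
    print("SomeOtherBot:", can_fetch("SomeOtherBot", page))
```

An empty Disallow: line means nothing is disallowed for that user agent, which is why the first record lets ia_archiver through.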

Here are examples of some other sites' robots.txt files:
archive.org/robots.txt
nytimes.com/robots.txt
cnn.com/robots.txt
bbc.co.uk/robots.txt
craigslist.org/robots.txt
slashdot.org/robots.txt

Poster: NoArchive Date: Mar 28, 2005 10:43am
Forum: web Subject: Re: Robots archive - noarchive META tags

Your reply is completely irrelevant. The question is about META tags, not robots.txt. Do you know what META tags are? Don't reply to questions you don't understand.

Poster: molly Date: Mar 29, 2005 7:00am
Forum: web Subject: Re: Robots archive - noarchive META tags

The Wayback Machine does adhere to properly formatted NoArchive META tags, but only for the specific documents that include this tag. If the tag was added in 2002, pages archived before that will still appear. If you think otherwise, and would like to bring a specific example to our attention, please send an archival URL (http://web.archive.org/TIMESTAMP/URL) to info AT archive.org.

To block specific UserAgents while allowing access to others, robots.txt is a good choice. This will also allow you to control robot access from a central location instead of managing this tag in a large number of documents.
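
To make the per-document nature of the tag concrete, here is a minimal sketch that checks whether a single HTML document carries a robots META tag with the noarchive directive. It uses Python's standard html.parser module. The names RobotsMetaParser and has_noarchive are made up for illustration, and this is only an assumption about what "properly formatted" might look like in practice, not a description of how the Wayback Machine actually processes pages.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated list of directives.
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noarchive(html):
    """Return True if the document has a robots META tag with noarchive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives


if __name__ == "__main__":
    tagged = '<html><head><META NAME="ROBOTS" CONTENT="NOARCHIVE"></head></html>'
    untagged = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    print(has_noarchive(tagged))
    print(has_noarchive(untagged))
```

Because the check runs on one document at a time, every page has to carry the tag itself, which is the maintenance cost that a single robots.txt file avoids.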

Thanks for using our services.

Poster: Igor Ranitovic Date: Mar 29, 2005 1:12am
Forum: web Subject: Re: Robots archive - noarchive META tags

NoArchive, you need to calm down a bit. It is nobody's fault but yours that your posting is not clear. In the future, follow these rules when posting on this forum:

- Courtesy never hurts -- it might help
- Make your posting simple to reply to
- Be explicit about the question you have
- Follow up with the solution

Poster: simon c Date: Mar 28, 2005 12:43pm
Forum: web Subject: Re: Robots archive - noarchive META tags

No need to be rude, NoArchive. I believe what Molly is saying is that Wayback Machine archiving specifically references robots.txt, and not necessarily meta tags (though I'm sure someone can back me up on that, or not.)

Poster: Bob_Dratch Date: May 13, 2005 1:58am
Forum: web Subject: Re: Robots archive - noarchive META tags

Hi - I was told the Wayback Machine also archives overseas in Egypt; I think Alexandria was mentioned. When I said I didn't want my old outdated COPYRIGHTED PAGES that specifically say NO COPYING OR ARCHIVING (and this "service" ripped them off), they said they had backed them up over there, and said, well, YOU are going to have to tell them TOO to take your pages away if you don't want us re-publishing YOUR COPYRIGHTED works without your permission. I highly suggest a CLASS action suit be brought against these archivers who take material without the copyright owner's permission. I've seen MANY commercial pages republished by this "service" without the owner's permission. It's not up to the copyright owner to tell these people to remove stolen material. The GLOBAL copyright protection laws protect the copyright owner - groups like this are not honoring copyright law, and really need to be addressed legally. There is nothing noble about a thief.
This post was modified by Bob_Dratch on 2005-05-13 08:58:15

Poster: molly Date: May 13, 2005 2:16am
Forum: web Subject: Re: Robots archive - noarchive META tags

Hello Bob,
We'd be happy to remove any of your pages from the Wayback Machine. Please email us at info@archive.org.

-best
Molly

Poster: smartsight Date: Jun 16, 2006 4:56am
Forum: web Subject: Re: Robots archive - noarchive META tags

I wish to prevent caching of my web pages by all search engines, but still allow archive.org to cache my pages. My understanding is that this is not currently possible. I now use META NAME="ROBOTS" CONTENT="NOARCHIVE" on my pages to keep search engines from showing cached pages (while still allowing them to index my pages), but I would like to add some equivalent of META NAME="IA_ARCHIVE" CONTENT="ARCHIVE" to explicitly make an exception for caching by archive.org. Will you be providing an option for this anytime soon? (Note that robots.txt does not let one make any distinction between caching and indexing, so that does not help either.) Thanks.
This post was modified by smartsight on 2006-06-16 11:55:18
This post was modified by smartsight on 2006-06-16 11:56:14

Poster: ellenlangsetmo Date: Jul 31, 2007 10:37am
Forum: web Subject: Re: Robots archive - noarchive META tags

I am responding to this because the crawler listed this under Bob Dratch. I think a solution to your troubles in general is this. Allow the crawlers, including the commercial ones, to crawl, and also allow the virtual library archives to crawl. Don't use DRM, exclusion, error 403 Forbidden server messages, etc. Instead, make a virtual book on a web server for free. Then sell a non-copy-protected CD for the cost of your other music CDs. Make these CDs available at the sources that sell books and sacred music. Also have a way that people can buy CDs from your place or business. On this CD I think you should sell the curriculum of that old mystery school and the new seminar too, and you should make your seminars accessible and sell equipment too. Make your business an equal opportunity business. By keeping people satisfied at a fair price you create friendly guests. In other words, treat others the way you want to be treated and don't be such a screech.